-
-
-
- ISPELL(4) ISPELL(4)
-
-
- NAME
- ispell - format of ispell dictionaries and affix files
-
- DESCRIPTION
- Ispell(1) requires two files to define the language that
- it is spell-checking. The first file is a dictionary
- containing words for the language, and the second is an
- "affix" file that defines the meaning of special flags in
- the dictionary. The two files are combined by buildhash
- (see ispell(1)) and written to a hash file which is not
- described here.
-
- A raw ispell dictionary (either the main dictionary or
- your own personal dictionary) contains a list of words,
- one per line. Each word may optionally be followed by a
- slash ("/") and one or more flags, which modify the root
- word as explained below. Depending on the options with
- which ispell was built, case may or may not be significant
- in either the root word or the flags, independently.
- Specifically, if the compile-time option CAPITALIZATION is
- defined, case is significant in the root word; if not,
- case is ignored in the root word. If the compile-time
- option MASKBITS is set to a value of 32, case is ignored
- in the flags; otherwise case is significant in the flags.
- Contact your system administrator or ispell maintainer for
- more information (or use the -vv flag to find out). The
- dictionary should be sorted with the -f flag of sort(1)
- before the hash file is built; this is done automatically
- by munchlist(1), which is the normal way of building
- dictionaries.
-
- If the dictionary contains words that have string
- characters (see the affix-file documentation below), they must
- be written in the format given by the defstringtype
- statement in the affix file. This will be the case for most
- non-English languages. Be careful to use this format,
- rather than that of your favorite formatter, when adding
- words to a dictionary. (If you add words to your personal
- dictionary during an ispell session, they will
- automatically be converted to the correct format. This feature
- can be used to convert an entire dictionary if necessary:)
-
- echo qqqqq > dummy.dict
- buildhash dummy.dict affix-file dummy.hash
- awk '{print "*" $0}END{print "#"}' old-dict-file \
- | ispell -a -T old-dict-string-type \
- -d ./dummy.hash -p ./new-dict-file \
- > /dev/null
- rm dummy.*
-
- The case of the root word controls the case of words
- accepted by ispell, as follows:
-
- (1) If the root word appears only in lower case (e.g.,
- bob), it will be accepted in lower case, capitalized,
- or all capitals.
-
- (2) If the root word appears capitalized (e.g.,
- Robert), it will not be accepted in all-lower case,
- but will be accepted capitalized or all in capitals.
-
- (3) If the root word appears all in capitals (e.g.,
- UNIX), it will only be accepted all in capitals.
-
- (4) If the root word appears with a "funny" capitalization
- (e.g., ITCorp), a word will be accepted only
- if it follows that capitalization, or if it appears
- all in capitals.
-
- (5) More than one capitalization of a root word may
- appear in the dictionary. Flags from different
- capitalizations are combined by OR-ing them
- together.
-
- Redundant capitalizations (e.g., bob and Bob) will be
- combined by buildhash and by ispell (for personal
- dictionaries), and can be removed from a raw dictionary by
- munchlist.
-
- For example, the dictionary:
-
- bob
- Robert
- UNIX
- ITcorp
- ITCorp
-
- will accept bob, Bob, BOB, Robert, ROBERT, UNIX, ITcorp,
- ITCorp, and ITCORP, and will reject all others. Some of
- the unacceptable forms are bOb, robert, Unix, and ItCorp.
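-
- The capitalization rules above can be modelled in a few lines
- of Python. This is only an illustrative sketch, not ispell's
- actual implementation; the names classify, accepts, and
- dictionary_accepts are hypothetical, and flags are ignored.
-

```python
def classify(word):
    """Return the capitalization class of a word."""
    if word.islower():
        return "lower"
    if word.isupper():
        return "upper"
    if word[0].isupper() and word[1:].islower():
        return "capitalized"
    return "funny"


def accepts(root, candidate):
    """Check one candidate spelling against one root, per rules (1)-(4)."""
    if root.lower() != candidate.lower():
        return False
    kind = classify(root)
    if kind == "lower":
        # Rule (1): lower case, capitalized, or all capitals.
        return classify(candidate) in ("lower", "capitalized", "upper")
    if kind == "capitalized":
        # Rule (2): capitalized or all capitals, never all-lower.
        return classify(candidate) in ("capitalized", "upper")
    if kind == "upper":
        # Rule (3): all capitals only.
        return candidate.isupper()
    # Rule (4): the exact "funny" capitalization, or all capitals.
    return candidate == root or candidate.isupper()


def dictionary_accepts(roots, candidate):
    """A candidate is accepted if any capitalization of the root allows it."""
    return any(accepts(root, candidate) for root in roots)


ROOTS = ["bob", "Robert", "UNIX", "ITcorp", "ITCorp"]
```

- With the example dictionary in ROOTS, dictionary_accepts(ROOTS,
- "ITCORP") is true and dictionary_accepts(ROOTS, "ItCorp") is
- false, matching the lists given above.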
-
- As mentioned above, root words in any dictionary may be
- extended by flags. Each flag is a single alphabetic
- character, which represents a prefix or suffix that may be
- added to the root to form a new word. For example, in an
- English dictionary the D flag can be added to bathe to
- make bathed. Since each flag is represented as a single bit
- in the hashed dictionary, this results in significant
- space savings. The munchlist script will reduce an
- existing raw dictionary by adding flags when possible.
-
- When a word is extended with an affix, the affix will be
- accepted only if it appears in the same case as the
- initial (prefix) or final (suffix) letter of the word. Thus,
- for example, the entry UNIX/M in the main dictionary (M
- means add an apostrophe and an "s" to make a possessive)
- would accept UNIX'S but would reject UNIX's. If UNIX's is
- legal, it must appear as a separate dictionary entry, and
- it will not be combined by munchlist. (In general, you
- don't need to worry about these things; munchlist
- guarantees that its output dictionary will accept the same set
- of words as its input, so all you have to do is add words
- to the dictionary and occasionally run munchlist to reduce
- its size.)
-
- As mentioned, the affix definition file describes the
- affixes associated with particular flags. It also
- describes the character set used by the language.
-
- Although the affix-definition grammar is designed for a
- line-oriented layout, it is actually a free-format yacc
- grammar and can be laid out weirdly if you want. Comments
- are started by a pound (sharp) sign (#), and continue to
- the end of the line. Backslashes are supported in the
- usual fashion (\nnn, plus specials \n, \r, \t, \v, \f, \b,
- and the new hex format \xnn). Any character with special
- meaning to the parser can be changed to an uninterpreted
- token by backslashing it; for example, you can declare a
- flag named "*" or ":" by writing flag \*: or flag \::.
-
- The grammar will be presented in a top-down fashion, with
- discussion of each element. An affix-definition file must
- contain exactly one table:
-
- table : [headers] [prefixes] [suffixes]
-
- At least one of prefixes and suffixes is required. They
- can appear in either order.
-
- headers : [ options ] char-sets
-
- The headers describe options global to this dictionary and
- language. These include the character sets to be used and
- the formatter, and the defaults for certain ispell flags.
-
- options : { fmtr-stmt | opt-stmt | flag-stmt | num-stmt }
-
- The options statements define the defaults for certain
- ispell flags and for the character sets used by the
- formatters.
-
- fmtr-stmt : { nroff-stmt | tex-stmt }
-
- A fmtr-stmt describes characters that have special meaning
- to a formatter. Normally, this statement is not
- necessary, but some languages may have preempted the usual
- defaults for use as language-specific characters. In this
- case, these statements may be used to redefine the special
- characters expected by the formatter.
-
- nroff-stmt : { nroffchars | troffchars } string
-
- The nroffchars statement allows redefinition of certain
- nroff control characters. The string given must be
- exactly five characters long, and must list substitutions
- for the left and right parentheses ("()"), the period
- ("."), the backslash ("\"), and the asterisk ("*"). (The
- right parenthesis is not currently used, but is included
- for completeness.) For example, the statement:
-
- nroffchars {}.\\*
-
- would replace the left and right parentheses with left and
- right curly braces for purposes of parsing nroff/troff
- strings, with no effect on the others (admittedly a
- contrived example). Note that the backslash is escaped with
- a backslash.
-
- tex-stmt : { TeXchars | texchars } string
-
- The TeXchars statement allows redefinition of certain
- TeX/LaTeX control characters. The string given must be
- exactly thirteen characters long, and must list
- substitutions for the left and right parentheses ("()"), the left
- and right square brackets ("[]"), the left and right curly
- braces ("{}"), the left and right angle brackets ("<>"),
- the backslash ("\"), the dollar sign ("$"), the asterisk
- ("*"), the period or dot ("."), and the percent sign
- ("%"). For example, the statement:
-
- texchars ()\[]<\><\>\\$*.%
-
- would replace the functions of the left and right curly
- braces with the left and right angle brackets for purposes
- of parsing TeX/LaTeX constructs, while retaining their
- functions for the tib bibliographic preprocessor. Note
- that the backslash, the left square bracket, and the right
- angle bracket must be escaped with a backslash.
-
- opt-stmt : { cmpnd-stmt | aff-stmt }
-
- cmpnd-stmt : compoundwords compound-opt
-
- aff-stmt : allaffixes on-or-off
-
- on-or-off : { on | off }
-
- compound-opt : { on-or-off | controlled character }
-
- An opt-stmt controls certain ispell defaults that are best
- made language-specific. The allaffixes statement controls
- the default for the -P and -m options to ispell. If
- allaffixes is turned off (the default), ispell will
- default to the behavior of the -P flag: root/affix
- suggestions will only be made if there are no "near misses". If
- allaffixes is turned on, ispell will default to the
- behavior of the -m flag: root/affix suggestions will
- always be made. The compoundwords statement controls the
- default for the -B and -C options to ispell. If
- compoundwords is turned off (the default), ispell will default to
- the behavior of the -B flag: run-together words will be
- reported as errors. If compoundwords is turned on, ispell
- will default to the behavior of the -C flag: run-together
- words will be considered as compounds if both are in the
- dictionary. This is useful for languages such as German
- and Norwegian, which form large numbers of compound words.
- Finally, if compoundwords is set to controlled, only words
- marked with the flag indicated by character (which should
- not be otherwise used) will be allowed to participate in
- compound formation. Because this option requires the
- flags to be specified in the dictionary, it is not
- available from the command line.
-
- flag-stmt : flagmarker character
-
- The flagmarker statement describes the character which is
- used to separate affix flags from the root word in a raw
- dictionary file. This must be a character which is not
- found in any word (including in string characters; see
- below). The default is "/" because this character is not
- normally used to represent special characters in any
- language.
-
- num-stmt : compoundmin digit
-
- The compoundmin statement controls the length of the two
- components of a compound word. This only has an effect if
- compoundwords is turned on or if the -C flag is given to
- ispell. In that case, only words at least as long as the
- given minimum will be accepted as components of a
- compound. The default is 3 characters.
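-
- The effect of compoundmin can be pictured with a short Python
- sketch. It assumes a compound is simply two dictionary words run
- together; how ispell actually searches for the split point is
- not described here, and the function name is_compound is
- hypothetical.
-

```python
def is_compound(word, dictionary, compoundmin=3):
    """Return True if word splits into two dictionary words,
    each at least compoundmin characters long."""
    # Only consider split points that leave both halves long enough.
    for i in range(compoundmin, len(word) - compoundmin + 1):
        if word[:i] in dictionary and word[i:] in dictionary:
            return True
    return False


WORDS = {"haus", "boot", "an", "ruf"}
```

- Here is_compound("hausboot", WORDS) succeeds, while
- is_compound("anruf", WORDS) fails with the default minimum of 3
- because "an" is too short; lowering compoundmin to 2 would
- accept it.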
-
- char-sets : norm-sets [ alt-sets ]
-
- The character-set section describes the characters that
- can be part of a word, and defines their collating order.
- There must always be a definition of "normal" character
- sets; in addition, there may be one or more partial
- definitions of "alternate" sets which are used with various
- text formatters.
-
- norm-sets : [ deftype ] charset-group
-
- A "normal" character set may optionally begin with a
- definition of the file suffixes that make use of this set.
- Following this are one or more character-set declarations.
-
- deftype : defstringtype name deformatter suffix*
-
- The defstringtype declaration gives a list of file
- suffixes which should make use of the default string
- characters defined as part of the base character set; it is
- only necessary if string characters are being defined.
- The name parameter is a string giving the unique name
- associated with these suffixes; often it is a formatter
- name. If the formatter is a member of the troff family,
- "nroff" should be used for the name associated with the
- most popular macro package; members of the TeX family
- should use "tex". Other names may be chosen freely, but
- they should be kept simple, as they are used in ispell's
- -T switch to specify a formatter type. The deformatter
- parameter specifies the deformatting style to use when
- processing files with the given suffixes. Currently, this
- must be either tex or nroff. The suffix parameters are a
- whitespace-separated list of strings which, if present at
- the end of a filename, indicate that the associated set of
- string characters should be used by default for this file.
- For example, the suffix list for the troff family
- typically includes suffixes such as ".ms", ".me", ".mm", etc.
-
- charset-group : { char-stmt | string-stmt | dup-stmt }*
-
- A char-stmt describes single characters; a string-stmt
- describes characters that must appear together as a
- string, and which usually represent a single character in
- the target language. Either may also describe conversion
- between upper and lower case. A dup-stmt is used to
- describe alternate forms of string characters, so that a
- single dictionary may be used with several formatting
- programs that use different conventions for representing
- non-ASCII characters.
-
- char-stmt : wordchars character-range
- | wordchars lowercase-range uppercase-range
- | boundarychars character-range
- | boundarychars lowercase-range uppercase-range
- string-stmt : stringchar string
- | stringchar lowercase-string uppercase-string
-
- Characters described with the boundarychars statement are
- considered part of a word only if they appear singly,
- embedded between characters declared with the wordchars or
- stringchar statements. For example, if the hyphen is a
- boundary character (useful in French), the string
- "foo-bar" would be a single word, but "-foo" would be the same
- as "foo", and "foo--bar" would be two words separated by
- non-word characters.
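-
- The boundary-character rule can be approximated with a regular
- expression. The Python sketch below assumes ASCII letters as the
- wordchars and the hyphen as the only boundary character; the
- function name split_words is hypothetical and string characters
- are not handled.
-

```python
import re


def split_words(text, wordchars="a-zA-Z", boundarychars="-"):
    """Split text into words. A boundary character counts as part of
    a word only when it appears singly between two word characters."""
    word = "[" + wordchars + "]+"
    boundary = "[" + re.escape(boundarychars) + "]"
    # One run of word characters, optionally followed by any number
    # of (single boundary character + run of word characters).
    pattern = re.compile(word + "(?:" + boundary + word + ")*")
    return pattern.findall(text)
```

- Under these assumptions split_words("foo-bar") yields one word,
- split_words("-foo") yields just "foo", and split_words("foo--bar")
- yields the two words "foo" and "bar".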
-
- If two ranges or strings are given in a char-stmt or
- string-stmt, the first describes characters that are
- interpreted as lowercase and the second describes
- uppercase. In the case of a stringchar statement, the two
- strings must be of the same length. Also, in a stringchar
- statement, the actual strings may contain both uppercase
- and lowercase characters themselves without difficulty; for
- instance, the statement
-
- stringchar "\\*(sS" "\\*(Ss"
-
- is legal and will not interfere with (or be interfered
- with by) other declarations of "s" and "S" as lower and
- upper case, respectively.
-
- A final note on string characters: some languages collate
- certain special characters as if they were strings. For
- example, the German "a-umlaut" is traditionally sorted as
- if it were "ae". Ispell is not capable of this; each
- character must be treated as an individual entity. So in
- certain cases, ispell will sort a list of words into a
- different order than the standard "dictionary" order for
- the target language.
-
- alt-sets : alttype [ alt-stmt* ]
-
- Because different formatters use different notations to
- represent non-ASCII characters, ispell must be aware of
- the representations used by these formatters. These are
- declared as alternate sets of string characters.
-
- alttype : altstringtype name suffix*
-
- The altstringtype statement introduces each set by
- declaring the associated formatter name and filename suffix
- list. This name and list are interpreted exactly as in
- the defstringtype statement above. Following this header
- are one or more alt-stmts which declare the alternate
- string characters used by this formatter.
-
- alt-stmt : altstringchar alt-string std-string
-
- The altstringchar statement describes alternate
- representations for string characters. For example, the -mm macro
- package of troff represents the German "a-umlaut" as a\*:,
- while TeX uses the sequence \"a. If the troff versions
- are declared as the standard versions using stringchar,
- the TeX versions may be declared as alternates by using
- the statement
-
- altstringchar \\\"a a\\*:
-
- When the altstringchar statement is used to specify
- alternate forms, all forms for a particular formatter must be
- declared together as a group. Also, each formatter or
- macro package must provide a complete set of characters,
- both upper- and lower-case, and the character sequences
- used for each formatter must be completely distinct.
- Character sequences which describe upper- and lower-case
- versions of the same printable character must also be the
- same length. It may be necessary to define some new
- macros for a given formatter to satisfy these
- restrictions. (The current version of buildhash does not enforce
- these restrictions, but failure to obey them may result in
- errors being introduced into files that are processed with
- ispell.)
-
- An important minor point is that ispell assumes that all
- characters declared as wordchars or boundarychars will
- occupy exactly one position on the terminal screen.
-
- A single character-set statement can declare either a
- single character or a contiguous range of characters. A
- range is given as in egrep and the shell: [a-z] means
- lowercase alphabetics; [^a-z] means all but lowercase, etc.
- All character-set statements are combined (unioned) to
- produce the final list of characters that may be part of a
- word. The collating order of the characters is defined by
- the order of their declaration; if a range is used, the
- characters are considered to have been declared in ASCII
- order. Characters that have case are collated next to
- each other, with the uppercase character first.
-
- The character-declaration statements have a rather strange
- behavior caused by their need to match each lowercase
- character with its uppercase equivalent. In any given
- wordchars or boundarychars statement, the characters in each
- range are first sorted into ASCII collating sequence, then
- matched one-for-one with the other range. (The two ranges
- must have the same number of characters.) Thus, for
- example, the two statements:
-
- wordchars [aeiou] [AEIOU]
- wordchars [aeiou] [UOIEA]
-
- would produce exactly the same effect. To get the vowels
- to match up "wrong", you would have to use separate
- statements:
-
- wordchars a U
- wordchars e O
- wordchars i I
- wordchars o E
- wordchars u A
-
- which would cause uppercase 'e' to be 'O', and lowercase
- 'O' to be 'e'. This should normally be a problem only
- with languages which have been forced to use a strange
- ASCII collating sequence. If your uppercase and lowercase
- letters both collate in the same order, you shouldn't have
- to worry about this "feature".
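-
- The sort-then-pair behavior described above can be reproduced
- in a few lines of Python. This is a sketch of the described
- matching only; pair_cases is a hypothetical name, and ranges are
- passed as plain strings of already-expanded characters.
-

```python
def pair_cases(lower_range, upper_range):
    """Map each lowercase character to its uppercase partner the way
    a single wordchars statement is described as doing: sort both
    ranges into ASCII order, then match them one-for-one."""
    if len(lower_range) != len(upper_range):
        raise ValueError("ranges must have the same number of characters")
    return dict(zip(sorted(lower_range), sorted(upper_range)))


# The two single-statement forms give the same mapping.
same_a = pair_cases("aeiou", "AEIOU")
same_b = pair_cases("aeiou", "UOIEA")

# Separate one-character statements are needed to pair them "wrong".
wrong = {}
for lo, up in [("a", "U"), ("e", "O"), ("i", "I"), ("o", "E"), ("u", "A")]:
    wrong.update(pair_cases(lo, up))
```

- Here same_a and same_b are identical, while wrong maps "e" to
- "O" as in the five separate statements.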
-
- The prefixes and suffixes sections have exactly the same
- syntax, except for the introductory keyword.
-
- prefixes : prefixes flagdef*
- suffixes : suffixes flagdef*
- flagdef : flag [*|~] char : repl*
-
- A prefix or suffix table consists of an introductory
- keyword and a list of flag definitions. Flags can be defined
- more than once, in which case the definitions are
- combined. Each flag controls one or more repls
- (replacements) which are conditionally applied to the beginnings
- or endings of various words.
-
- Flags are named by a single character char. Depending on
- a configuration option, this character can be either any
- uppercase letter (the default configuration) or any 7-bit
- ASCII character. Most languages should be able to get
- along with just 26 flags.
-
- A flag character may be prefixed with one or more option
- characters. (If you wish to use one of the option
- characters as a flag character, simply enclose it in double
- quotes.)
-
- The asterisk (*) option means that this flag participates
- in cross-product formation. This only matters if the file
- contains both prefix and suffix tables. If so, all
- prefixes and suffixes marked with an asterisk will be applied
- in all cross-combinations to the root word. For example,
- consider the root fix with prefixes pre and in, and
- suffixes es and ed. If all flags controlling these prefixes
- and suffixes are marked with an asterisk, then the single
- root fix would also generate prefix, prefixes, prefixed,
- infix, infixes, infixed, fix, fixes, and fixed.
- Cross-product formation can produce a large number of words
- quickly, some of which may be illegal, so watch out. If
- cross-products produce illegal words, munchlist will not
- produce those flag combinations, and the flag will not be
- useful.
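-
- Cross-product formation for the fix example can be sketched
- in Python as follows. The sketch assumes every affix is a plain
- string added without conditions or stripping, and that the root
- carries all the flags; the function name expand is hypothetical.
-

```python
def expand(root, prefixes, suffixes):
    """Generate the words for a root.

    prefixes and suffixes are lists of (affix, cross_product) pairs,
    where cross_product is True if the controlling flag was declared
    with the asterisk option."""
    words = [root]
    words += [p + root for p, _ in prefixes]
    words += [root + s for s, _ in suffixes]
    # Only affixes whose flags are both marked with "*" are combined.
    words += [p + root + s
              for p, p_cross in prefixes if p_cross
              for s, s_cross in suffixes if s_cross]
    return words


crossed = expand("fix", [("pre", True), ("in", True)],
                 [("es", True), ("ed", True)])
plain = expand("fix", [("pre", False), ("in", False)],
               [("es", False), ("ed", False)])
```

- With all four flags marked, crossed holds the nine words listed
- above; without the asterisks, plain holds only fix, prefix,
- infix, fixes, and fixed.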
-
- repl : condition* > [ - strip-string , ] append-string
-
- The ~ option specifies that the associated flag is only
- active when a compound word is being formed. This is
- useful in a language like German, where the form of a word
- sometimes changes inside a compound.
-
- A repl is a conditional rule for modifying a root word.
- Up to 8 conditions may be specified. If the conditions
- are satisfied, the rules on the right-hand side of the
- repl are applied, as follows:
-
- (1) If a strip-string is given, it is first stripped
- from the beginning or ending (as appropriate) of
- the root word.
-
- (2) Then the append-string is added at that point.
-
- For example, the condition . means "any word", and the
- condition Y means "any word ending in Y". The following
- (suffix) replacements:
-
- . > MENT
- Y > -Y,IES
-
- would change induce to inducement and fly to flies. (If
- they were controlled by the same flag, they would also
- change fly to flyment, which might not be what was wanted.
- Munchlist can be used to protect against this sort of
- problem; see the command sequence given below.)
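-
- The strip-then-append step for a suffix can be written as a
- small Python helper. This is a sketch of the two steps only: the
- conditions are not checked, words are kept in upper case to
- match the rules, and apply_suffix is a hypothetical name.
-

```python
def apply_suffix(word, strip, append):
    """Apply the right-hand side of a suffix repl to a root word."""
    if strip:
        if not word.endswith(strip):
            raise ValueError("word does not end with the strip-string")
        # Step (1): remove the strip-string from the end of the root.
        word = word[:-len(strip)]
    # Step (2): add the append-string at that point.
    return word + append


inducement = apply_suffix("INDUCE", "", "MENT")   # rule: . > MENT
flies = apply_suffix("FLY", "Y", "IES")           # rule: Y > -Y,IES
flyment = apply_suffix("FLY", "", "MENT")         # the unwanted form
```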
-
- No matter how much you might wish it, the strings on the
- right must be strings of specific characters, not ranges.
- The reasons are rooted deeply in the way ispell works, and
- it would be difficult or impossible to provide for more
- flexibility. For example, you might wish to write:
-
- [EY] > -[EY],IES
-
- This will not work. Instead, you must use two separate
- rules:
-
- E > -E,IES
- Y > -Y,IES
-
- The application of repls can be restricted to certain
- words with conditions:
-
- condition : { . | character | range }
-
- A condition is a restriction on the characters that
- adjoin, and/or are replaced by, the right-hand side of the
- repl. Up to 8 conditions may be given, which should be
- enough context for anyone. The right-hand side will be
- applied only if the conditions in the repl are satisfied.
- The conditions also implicitly define a length; roots
- shorter than the number of conditions will not pass the
- test. (As a special case, a condition of a single dot "."
- defines a length of zero, so that the rule applies to all
- words indiscriminately.) This length is independent of
- the separate test that insists that all flags produce an
- output word length of at least four.
-
- Conditions that are single characters should be separated
- by white space. For example, to specify words ending in
- "ED", write:
-
- E D > -ED,ING # As in covered > covering
-
- If you write:
-
- ED > -ED,ING
-
- the effect will be the same as:
-
- [ED] > -ED,ING
-
- As a final minor but important point, it is sometimes
- useful to rebuild a dictionary file using an incompatible
- suffix file. For example, suppose you expanded the "R"
- flag to generate "er" and "ers" (thus making the Z flag
- somewhat obsolete). To build a new dictionary newdict
- that, using newaffixes, will accept exactly the same list
- of words as the old list olddict did using oldaffixes, the
- -c switch of munchlist is useful, as in the following
- example:
-
- $ munchlist -c oldaffixes -l newaffixes olddict > newdict
-
- If you use this procedure, your new dictionary will always
- accept the same list the original did, even if you badly
- screwed up the affix file. This is because munchlist
- compares the words generated by a flag with the original word
- list, and refuses to use any flags that generate illegal
- words. (But don't forget that the munchlist step takes a
- long time and eats up temporary file space.)
-
- EXAMPLES
- As an example of conditional suffixes, here is the
- specification of the S flag from the English affix file:
-
- flag *S:
- [^AEIOU]Y > -Y,IES # As in imply > implies
- [AEIOU]Y > S # As in convey > conveys
- [SXZH] > ES # As in fix > fixes
- [^SXZHY] > S # As in bat > bats
-
- The first line applies to words ending in Y, but not in
- vowel-Y. The second takes care of the vowel-Y words. The
- third then handles those words that end in a sibilant or
- near-sibilant, and the last picks up everything else.
-
- Note that the conditions are written very carefully so
- that they apply to disjoint sets of words. In particular,
- note that the fourth line excludes words ending in Y as
- well as the obvious SXZH. Otherwise, it would convert
- "imply" into "implys".
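-
- The disjointness of the four S-flag rules can be checked with
- a short Python sketch. It relies on the fact that these
- particular conditions happen to be valid regular expressions
- when anchored at the end of the word; that shortcut, the S_FLAG
- table layout, and the name apply_flag are assumptions of this
- sketch, not a description of how ispell evaluates conditions.
-

```python
import re

# (conditions, strip-string, append-string) for the English S flag.
S_FLAG = [
    ("[^AEIOU]Y", "Y", "IES"),  # As in imply > implies
    ("[AEIOU]Y", "", "S"),      # As in convey > conveys
    ("[SXZH]", "", "ES"),       # As in fix > fixes
    ("[^SXZHY]", "", "S"),      # As in bat > bats
]


def apply_flag(word, rules):
    """Return every word the flag's rules generate from the root."""
    word = word.upper()
    results = []
    for conditions, strip, append in rules:
        # Suffix conditions must match the end of the root word.
        if re.search("(?:" + conditions + ")$", word):
            stem = word[:len(word) - len(strip)] if strip else word
            results.append(stem + append)
    return results
```

- Because the conditions are disjoint, each of imply, convey, fix,
- and bat produces exactly one result from apply_flag(word, S_FLAG).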
-
- Although the English affix file does not do so, you can
- also have a flag generate more than one variation on a
- root word. For example, we could extend the English "R"
- flag as follows:
-
- flag *R:
- E > R # As in skate > skater
- E > RS # As in skate > skaters
- [^AEIOU]Y > -Y,IER # As in multiply > multiplier
- [^AEIOU]Y > -Y,IERS # As in multiply > multipliers
- [AEIOU]Y > ER # As in convey > conveyer
- [AEIOU]Y > ERS # As in convey > conveyers
- [^EY] > ER # As in build > builder
- [^EY] > ERS # As in build > builders
-
- This flag would generate both "skater" and "skaters" from
- "skate". This capability can be very useful in languages
- that make use of noun, verb, and adjective endings. For
- instance, one could define a single flag that generated
- all of the German "weak" verb endings.
-
- SEE ALSO
- ispell(1)